169 research outputs found

    A Strategy analysis for genetic association studies with known inbreeding

    Background: Association studies aim to identify the genetic variants related to a specific disease through statistical multiple hypothesis testing or segregation analysis in pedigrees. Such studies have been very successful for Mendelian monogenic disorders but less successful in identifying genetic variants related to complex diseases, whose onset depends on interactions between different genes and the environment. Current technology allows more than a million markers to be genotyped, and this number has been increasing rapidly in recent years with imputation based on template sets and whole-genome sequencing. This type of data introduces a great amount of noise into the statistical analysis and usually requires a large number of samples. Current methods seldom take into account gene-gene and gene-environment interactions, which are fundamental, especially in complex diseases. In this paper we propose to use a non-parametric additive model, which accounts for interactions of unknown order, to detect the genetic variants related to diseases. Although this is not new to the current literature, we show that in an isolated population, where the most closely related subjects also share most of their genetic code, the use of additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes/environmental variables associated with the disease by means of Random Forest. Results: We have evidence, from statistical theory, simulations and two applications, that the procedure we built eliminates stratification between cases and controls and that it also has enough precision in identifying genetic variants responsible for a disease. This procedure has been successfully applied to beta-thalassemia, a well-known Mendelian disease, and to common asthma, for which we identified candidate genes that underlie asthma susceptibility. Some of these candidate genes have also been reported as related to common asthma in the current literature. Conclusions: The data analysis approach, based on selecting the most related cases and controls along with the Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, this method also provides a prediction model that is accurate in estimating unknown disease status and that can be used more generally to build test kits for a wide class of Mendelian diseases.
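The matching step described in this abstract can be illustrated with a minimal sketch. It assumes that relatedness is summarized as a pairwise kinship score between each case and each control, which the abstract does not state. It also uses an exhaustive search over assignments as a stand-in for the Hungarian method: both return an optimal one-to-one assignment, but the exhaustive search is practical only for tiny inputs. The function name `best_matching` and the kinship values are hypothetical.

```python
from itertools import permutations


def best_matching(kinship):
    """Pair each case with a distinct control so that summed kinship is maximal.

    kinship[i][j] is an assumed relatedness score between case i and control j.
    Exhaustive search is O(n!), so this is only a stand-in for the Hungarian
    method, which solves the same assignment problem in polynomial time.
    """
    n_cases = len(kinship)
    n_controls = len(kinship[0])
    best_total = float("-inf")
    best_pairs = None
    # Enumerate every way of giving each case its own control.
    for controls in permutations(range(n_controls), n_cases):
        total = sum(kinship[i][j] for i, j in enumerate(controls))
        if total > best_total:
            best_total = total
            best_pairs = list(enumerate(controls))
    return best_pairs, best_total


# Hypothetical kinship coefficients: 3 cases (rows) x 4 controls (columns).
KINSHIP = [
    [0.25, 0.125, 0.0, 0.0625],
    [0.125, 0.25, 0.0625, 0.0],
    [0.0625, 0.0, 0.03125, 0.25],
]

pairs, total = best_matching(KINSHIP)
print(pairs, total)
```

The matched pairs would then form the case/control sample passed to a classifier such as Random Forest; that step is not sketched here.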

    MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads

    Copy number variations (CNVs) are associated with many complex diseases. Next-generation sequencing data enable one to identify precise CNV breakpoints, to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints; when the breakpoints lie in a repeated region, the method reports a range within which the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and the CIGAR strings of the reads that cover them. A read with a long soft-clipped part (denoted as S in CIGAR) at its 3′ (right) end can be used to identify the 5′ (left) side of the breakpoints, and a read with a long S part at the 5′ end can be used to identify the breakpoint at the 3′ side. To ensure that both types of reads cover the same CNV, we require the overlapped common string to include both of the soft-clipped parts. When a CNV starts and ends in the same repeated region, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the methods in a C++ package intended for the current Illumina MiSeq and HiSeq platforms, for both whole-genome and exon-sequencing. Our simulation studies have shown that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application have shown that the detected CNVs are consistent with zygosity and read depth information. The software package is available at http://statgene.med.upenn.edu/softprog.html
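The way soft-clipped reads point to a breakpoint can be sketched in a few lines. This is not the MATCHCLIP implementation (which is a C++ package); it is a minimal Python illustration. It assumes SAM conventions (1-based `POS` giving the first aligned base) and uses a hypothetical function name `breakpoint_candidate` and a hypothetical `min_clip` threshold for what counts as a "long" clip.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '60M40S' into [(60, 'M'), (40, 'S')]."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def breakpoint_candidate(pos, cigar, min_clip=10):
    """Return (side, reference_position) suggested by a soft-clipped read.

    pos is the 1-based position of the first aligned base (SAM POS).
    min_clip is an assumed threshold for a "long" soft clip.
    A read clipped at its right end marks the left side of the breakpoints;
    a read clipped at its left end marks the right side.
    If both ends are clipped, the right-end clip is reported.
    """
    ops = parse_cigar(cigar)
    if not ops:
        return None
    ref_len = sum(length for length, op in ops if op in REF_CONSUMING)
    first_len, first_op = ops[0]
    last_len, last_op = ops[-1]
    if last_op == "S" and last_len >= min_clip:
        # Last aligned base, immediately before the clipped part.
        return ("left", pos + ref_len - 1)
    if first_op == "S" and first_len >= min_clip:
        # First aligned base, immediately after the clipped part.
        return ("right", pos)
    return None


print(breakpoint_candidate(1000, "60M40S"))
print(breakpoint_candidate(5000, "40S60M"))
```

The sketch covers only the per-read step. Requiring that the overlapped common string include both soft-clipped parts, and reporting a sliding range in repeated regions, would need the read sequences as well.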

    Microsatellites and SNPs linkage analysis in a Sardinian genetic isolate confirms several essential hypertension loci previously identified in different populations

    Background. A multiplicity of study designs, such as candidate gene analysis, genome-wide search (GWS) and, recently, whole-genome association studies, have been employed to identify the genetic components of essential hypertension (EH). Several genome-wide linkage studies of EH and blood pressure-related phenotypes demonstrate that there is no single locus with a major effect, while several genomic regions likely to contain EH-susceptibility loci were validated by multiple studies. Methods. We carried out the clinical assessment of the entire adult population in a Sardinian village (Talana) and analyzed 16 selected families with 62 hypertensive subjects out of 267 individuals. We carried out a double GWS using a set of 902 uniformly spaced microsatellites and a high-density SNP map on the same group of families. Results. Three loci were identified by both the microsatellite and the SNP scans, and the linkage results obtained showed a remarkable degree of similarity. These loci were identified on chromosomes 2q24, 11q23.1–25 and 13q14.11–21.33. These findings are further supported by the literature, in which these loci are broadly described as associated with EH or related phenotypes. Bioinformatic investigation of these loci shows several potential EH candidate genes, several of which are already associated with blood pressure regulation pathways. Conclusion. Our search for major EH genetic susceptibility factors indicates that EH in the genetic isolate of Talana is due to the contribution of several genes contained in loci identified and replicated by earlier findings in different human populations.

    Browsing Isolated Population Data

    BACKGROUND: In our studies of genetically isolated populations in a remote mountain area in the center of Sardinia (Italy), we found that 80–85% of the inhabitants of each village belong to a single huge pedigree, with families strictly connected to each other through hundreds of loops. Moreover, intermarriages between villages join the pedigrees of different villages through links that make family trees even more complicated. Unfortunately, none of the commonly used pedigree drawing tools is able to draw the complete pedigree, even though it is commonly accepted that the visual representation of families is very important, as it helps researchers identify clusters of inherited traits and genotypes. This representation issue compels researchers to work with subsets extracted from the overall genealogy, causing a serious loss of information on familial relationships. To visually explore such complex pedigrees, we developed PedNavigator, a browser for genealogical databases suited to genetic studies. RESULTS: PedNavigator is useful for genealogical research because it can represent family relations between persons and allows visual verification of the links during family history reconstruction. For genetic studies, it helps to follow the propagation of a specific set of genetic markers (haplotype), or to select people for linkage analysis, by showing relations between the various branches of a family tree of affected subjects. AVAILABILITY: PedNavigator is an application integrated into a Framework, based on the Oracle platform, designed to handle data for human genetic studies. To make PedNavigator available to people who do not have the required informatics infrastructure or systems, we developed PedNavigator Lite, which has largely the same features as the integrated version and is based on the MySQL database server. This version is free for academic users, and it is available for download from our site

    Haplotype affinities resolve a major component of goat (<i>Capra hircus</i>) MtDNA D-loop diversity and reveal specific features of the Sardinian stock

    Goat mtDNA haplogroup A is a poorly resolved lineage absorbing most of the overall diversity and is found in locations as distant as Eastern Asia and Southern Africa. Its phylogenetic dissection would cast light on an important portion of the spread of goat breeding. The aims of this work were 1) to provide an operational definition of meaningful mtDNA units within haplogroup A, 2) to investigate the mechanisms underlying the maintenance of diversity by considering the modes of selection operated by breeders and 3) to identify the peculiarities of Sardinian mtDNA types. We sequenced the mtDNA D-loop in a large sample of animals (1,591) which represents a non-trivial quota of the entire goat population of Sardinia. We found that Sardinia mirrors a large quota of mtDNA diversity of Western Eurasia in the number of variable sites, their mutational pattern and allele frequency. By using Bayesian analysis, a distance-based tree and a network analysis, we recognized demographically coherent groups of sequences identified by particular subsets of the variable positions. The results showed that this assignment system could be reproduced in other studies, capturing the greatest part of haplotype diversity. We identified haplotype groups overrepresented in Sardinian goats as a result of founder effects. We found that breeders maintain diversity of matrilines most likely through equalization of the reproductive potential. Moreover, the relevant amount of inter-farm mtDNA diversity found does not increase proportionally with distance. Our results illustrate the effects of breeding practices on the composition of maternal gene pool and identify mtDNA types that may be considered in projects aimed at retrieving the maternal component of the oldest breeds of Sardinia.

    Trophic niches of four sympatric rainforest anurans from southern Nigeria: does resource partitioning play a role in structuring the community?

    Resource partitioning is a mechanism that can reduce the intensity of interspecific competition between morphologically and eco-ethologically similar, syntopic species. Evidence for resource partitioning between four syntopic anuran species was investigated by examining the diet (through stomach dissection) of frogs bought from bush meat traders in southeastern Nigeria. Considering the four species together, a total of 32 different prey types were found. Ptychadena oxyrhynchus consumed 28 of them, while P. aequiplicata consumed 17, Bufo maculatus 15 and Hoplobatrachus occipitalis only 10. For the first three species, the cumulative-diversity curves indicated that a plateau phase was reached, i.e. that the prey composition could be considered reliably assessed. Common prey items, consumed by all four anuran species, were: Formicoidea, Coleoptera adults, Araneidae, Isopoda, Oligochaeta, and Pulmonata. Common prey items, consumed by three of the four amphibians, were: Dermaptera, Hemiptera, Odonata adults, and Orthoptera. Head width varied significantly between species, but there was no statistical difference between the two Ptychadena species. Head width was significantly correlated with prey volume in the stomach in each of three species. Resource partitioning (in terms of prey types) was found to be particularly strong between two closely related species of Ptychadena. The divergence between Ptychadena oxyrhynchus and P. aequiplicata was such that multivariate analyses placed each one of them closer in feeding ecology to either Hoplobatrachus occipitalis or Bufo maculatus than to its congener. Such strong divergence is hypothesized to play a major role in maintaining the structure of this mixed anuran community.

    Genome-wide association analysis on normal hearing function identifies PCDH20 and SLC28A3 as candidates for hearing function and loss

    Hearing loss and individual differences in normal hearing both have a substantial genetic basis. Although many new genes contributing to deafness have been identified, very little is known about genes/variants modulating the normal range of hearing ability. To fill this gap, we performed a two-stage meta-analysis on hearing thresholds (tested at 0.25, 0.5, 1, 2, 4, 8 kHz) and on pure-tone averages (low-, medium- and high-frequency thresholds grouped) in several isolated populations from Italy and Central Asia (total N = 2636). Here, we detected two genome-wide significant loci close to PCDH20 and SLC28A3 (top hits: rs78043697, P = 4.71E-10 and rs7032430, P = 2.39E-09, respectively). For both loci, we sought replication in two independent cohorts: B58C from the UK (N = 5892) and FITSA from Finland (N = 270). Both loci were successfully replicated at a nominal level of significance (P < 0.05). In order to confirm our quantitative findings, we carried out RT-PCR and reported RNA-Seq data, which showed that both genes are expressed in mouse inner ear, especially in hair cells, further suggesting them as good candidates for modulatory genes in the auditory system. Sequencing data revealed no functional variants in the coding region of PCDH20 or SLC28A3, suggesting that variation in regulatory sequences may affect expression. Overall, these results contribute to a better understanding of the complex mechanisms underlying human hearing function.

    Cancer incidence in Italian contaminated sites

    Introduction. The incidence of cancer among residents in sites contaminated by pollutants with a possible health impact is not adequately studied. In Italy, the SENTIERI Project (Epidemiological study of residents in National Priority Contaminated Sites, NPCSs) was implemented to study major health outcomes for residents in 44 NPCSs. Methods. The Italian Association of Cancer Registries (AIRTUM) records cancer incidence in 23 NPCSs. For each NPCS, the incidence of all malignant cancers combined and of 35 cancer sites (coded according to ICD-10) was analysed (1996-2005). The observed cases were compared to those expected based on age (5-year periods, 18 classes), gender, calendar period (1996-2000; 2001-2005), geographical area (North-Centre and Centre-South) and cancer-site-specific rates. Standardized Incidence Ratios (SIR) with 90% Confidence Intervals were computed. Results. In both genders an excess was observed for overall cancer incidence (9% in men and 7% in women) as well as for specific cancer sites (colon and rectum, liver, gallbladder, pancreas, lung, skin melanoma, bladder and Non Hodgkin lymphoma). Deficits were observed for gastric cancer in both genders, chronic lymphoid leukemia (men), malignant thyroid neoplasms, corpus uteri and connective and soft-tissue tumours and sarcomas (women). Discussion. This report is, to our knowledge, the first one on cancer risk of residents in NPCSs. The study, although not aiming to estimate the cancer burden attributable to the environment as compared to occupation or life-style, supports the credibility of an etiologic role of environmental exposures in contaminated sites. Ongoing analyses focus on the interpretation of risk factors for excesses of specific cancer types overall and in specific NPCSs in relation to the presence of carcinogenic pollutants.
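The Standardized Incidence Ratio mentioned in the Methods is the ratio of observed to expected cases, where the expected count is the sum over strata of person-years times the reference rate. The sketch below is a hedged illustration: the abstract does not say how the 90% confidence intervals were obtained, so Byar's approximation to the Poisson interval is used here as an assumption. The strata, rates and counts are hypothetical, and the function names are invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def expected_cases(strata):
    """Sum person-years x reference rate over strata (age, period, area...)."""
    return sum(person_years * rate for person_years, rate in strata)


def sir_with_ci(observed, expected, level=0.90):
    """Return (SIR, lower, upper) using Byar's approximation (an assumption).

    observed must be a positive count; expected must be positive.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    # Two-sided interval: z is the (1 + level) / 2 normal quantile.
    z = NormalDist().inv_cdf((1 + level) / 2)
    lower_count = observed * (1 - 1 / (9 * observed) - z / (3 * sqrt(observed))) ** 3
    upper_base = observed + 1
    upper_count = upper_base * (1 - 1 / (9 * upper_base) + z / (3 * sqrt(upper_base))) ** 3
    return observed / expected, lower_count / expected, upper_count / expected


# Hypothetical strata: (person-years, reference incidence rate per person-year).
STRATA = [(20000, 0.002), (30000, 0.002)]
expected = expected_cases(STRATA)
sir, lower, upper = sir_with_ci(109, expected)
print(round(sir, 2), round(lower, 2), round(upper, 2))
```

An SIR above 1 with an interval that excludes 1 would indicate an excess relative to the reference rates. In this hypothetical example the interval includes 1.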

    Application of a new method for GWAS in a related case/control sample with known pedigree structure: identification of new loci for nephrolithiasis

    In contrast to large GWA studies based on thousands of individuals and large meta-analyses combining GWAS results, we analyzed a small case/control sample for uric acid nephrolithiasis. Our cohort of closely related individuals is derived from a small, genetically isolated village in Sardinia, with well-characterized genealogical data linking the extant population back to the 16th century. It is expected that the number of risk alleles involved in complex disorders is smaller in isolated founder populations than in more diverse populations, and the power to detect association with complex traits may be increased when related, homogeneous affected individuals are selected, as they are more likely to be enriched with and share specific risk variants than are unrelated, affected individuals from the general population. When related individuals are included in an association study, correlations among relatives must be accurately taken into account to ensure validity of the results. A recently proposed association method uses an empirical genotypic covariance matrix estimated from genome-screen data to allow for additional population structure and cryptic relatedness that may not be captured by the genealogical data. We apply the method to our data, and we also investigate the properties of the method, as well as other association methods, in our highly inbred population, as previous applications were to outbred samples. The more promising regions identified in our initial study in the genetic isolate were then further investigated in an independent sample collected from the Italian population. Among the loci that showed association in this study, we observed evidence of a possible involvement of the region encompassing the gene LRRC16A, already associated with serum uric acid levels in a large meta-analysis of 14 GWAS, suggesting that this locus might lead a pathway for uric acid metabolism that may be involved in gout as well as in nephrolithiasis.

    High Differentiation among Eight Villages in a Secluded Area of Sardinia Revealed by Genome-Wide High Density SNPs Analysis

    To better design association studies for complex traits in isolated populations, it is important to understand how history and isolation moulded the genetic features of different communities. Population isolates should not be considered homogeneous a priori, even if the communities are not distant and are part of a small region. We studied a particular area of Sardinia called Ogliastra, characterized by the presence of several distinct villages that differ in history, immigration events and population size. Cultural and geographic isolation characterized the history of these communities. We determined LD parameters in 8 villages and defined population structure through high-density SNPs (about 360 K) on 360 unrelated people (45 selected samples from each village). These isolates showed differences in LD values and LD map length. Five of these villages show high LD values, probably due to their reduced population size and extreme isolation. High genetic differentiation among villages was detected. Moreover, population structure analysis revealed a high correlation between genetic and geographic distances. Our study indicates that history, geography and biodemography have influenced the genetic features of Ogliastra communities, producing differences in LD and population structure. All these data demonstrate that we can consider each village an isolate with specific characteristics. We suggest that, in order to optimize the study design of complex traits, a thorough characterization of genetic features is useful to identify the presence of sub-populations and stratification within genetic isolates.
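The abstract refers to "LD parameters" without naming them. As a hedged illustration, the sketch below computes three standard pairwise linkage disequilibrium measures (D, |D′| and r²) for two biallelic loci from phased haplotype counts. The choice of these measures, the function name `ld_measures` and the counts are assumptions, not details taken from the study.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are counts of the four phased haplotypes.
    Raises ValueError if either locus is monomorphic (LD is undefined).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("LD is undefined when a locus is monomorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, r2


# Hypothetical counts: complete LD, then linkage equilibrium.
print(ld_measures(50, 0, 0, 50))
print(ld_measures(25, 25, 25, 25))
```

With genotype data rather than phased haplotypes, haplotype frequencies would first have to be estimated, which this sketch does not attempt.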